10 research outputs found
Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance
Retrieval augmented models show promise in enhancing traditional language
models by improving their contextual understanding, integrating private data,
and reducing hallucination. However, the processing time required for retrieval
augmented large language models poses a challenge when applying them to tasks
that require real-time responses, such as composition assistance.
To overcome this limitation, we propose the Hybrid Retrieval-Augmented
Generation (HybridRAG) framework that leverages a hybrid setting that combines
both client and cloud models. HybridRAG incorporates retrieval-augmented memory
generated asynchronously by a Large Language Model (LLM) in the cloud. By
integrating this retrieval augmented memory, the client model acquires the
capability to generate highly effective responses, benefiting from the LLM's
capabilities. Furthermore, through asynchronous memory integration, the client
model is capable of delivering real-time responses to user requests without the
need to wait for memory synchronization from the cloud. Our experiments on
Wikitext and Pile subsets show that HybridRAG achieves lower latency than a
cloud-based retrieval-augmented LLM, while outperforming client-only models in
utility.
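The core idea of the abstract — a client model that answers immediately from the latest available memory snapshot while a cloud LLM refreshes that memory asynchronously — can be sketched as follows. This is a minimal illustration under assumed names; it is not the paper's actual API, and the two model calls are stand-ins.

```python
import threading
import time

class HybridRAGClient:
    """Sketch of asynchronous memory integration (illustrative, not the
    paper's implementation)."""

    def __init__(self):
        self.memory = ""           # latest retrieval-augmented memory snapshot
        self._lock = threading.Lock()

    def refresh_memory_async(self, query):
        """Ask the cloud LLM for fresh memory without blocking the client."""
        def worker():
            snapshot = self._cloud_llm_memory(query)   # slow cloud round-trip
            with self._lock:
                self.memory = snapshot
        threading.Thread(target=worker, daemon=True).start()

    def complete(self, prefix):
        """Respond immediately with whatever memory is currently available."""
        with self._lock:
            context = self.memory
        return self._client_model(prefix, context)

    # Stand-ins for the real cloud and client models:
    def _cloud_llm_memory(self, query):
        time.sleep(0.05)                               # simulated latency
        return f"facts about {query}"

    def _client_model(self, prefix, context):
        return f"{prefix} [using memory: {context!r}]"

client = HybridRAGClient()
client.refresh_memory_async("wikitext topic")
fast = client.complete("The article says")   # returns without waiting
time.sleep(0.1)                              # cloud memory has now arrived
slow = client.complete("The article says")
```

The first completion returns before the cloud round-trip finishes (with an empty memory), which is exactly the latency trade-off the framework exploits: responses never block on memory synchronization.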
PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis
Major cloud providers have employed advanced AI-based solutions like large
language models to aid humans in identifying the root causes of cloud
incidents. Despite the growing prevalence of AI-driven assistants in the root
cause analysis process, their effectiveness in assisting on-call engineers is
constrained by low accuracy due to the intrinsic difficulty of the task, a
propensity for LLM-based approaches to hallucinate, and difficulties in
distinguishing these well-disguised hallucinations. To address this challenge,
we propose to perform confidence estimation for the predictions to help on-call
engineers make decisions on whether to adopt the model prediction. Considering
the black-box nature of many LLM-based root cause predictors, fine-tuning or
temperature-scaling-based approaches are inapplicable. We therefore design an
innovative confidence estimation framework based on prompting
retrieval-augmented large language models (LLMs) that demand a minimal amount
of information from the root cause predictor. This approach consists of two
scoring phases: the LLM-based confidence estimator first evaluates its
confidence in making judgments in the face of the current incident that
reflects its ``grounded-ness'' level in reference data, then rates the root
cause prediction based on historical references. An optimization step combines
these two scores for a final confidence assignment. We show that our method is
able to produce calibrated confidence estimates for predicted root causes,
validate the usefulness of retrieved historical data and the prompting strategy
as well as the generalizability across different root cause prediction models.
Our study takes an important step toward reliably and effectively embedding
LLMs into cloud incident management systems.
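The two-phase scoring described above — a groundedness score and a historical-reference rating, combined by an optimization step — could be sketched as below. The scoring inputs and the grid-search weighting are illustrative assumptions, not the paper's exact optimization.

```python
# Combine two prompting-based scores into one calibrated confidence.
# The weighting scheme here (convex combination fit by grid search on a
# held-out set) is an assumed stand-in for the paper's optimization step.

def combine(grounded, rating, w):
    """Convex combination of groundedness score and historical rating."""
    return w * grounded + (1 - w) * rating

def fit_weight(examples):
    """Grid-search w in [0, 1] to minimize the Brier score against labels."""
    best_w, best_brier = 0.0, float("inf")
    for i in range(101):
        w = i / 100
        brier = sum((combine(g, r, w) - y) ** 2
                    for g, r, y in examples) / len(examples)
        if brier < best_brier:
            best_w, best_brier = w, brier
    return best_w

# Toy held-out set: (groundedness, historical rating, was the RCA correct?)
held_out = [
    (0.9, 0.8, 1), (0.8, 0.9, 1), (0.3, 0.4, 0),
    (0.2, 0.1, 0), (0.7, 0.6, 1), (0.4, 0.2, 0),
]
w = fit_weight(held_out)
confidence = combine(0.85, 0.75, w)
```

Because the combination only needs the two scores, nothing is required from the (black-box) root cause predictor beyond its prediction, which matches the abstract's "minimal amount of information" claim.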
Solving the Batch Stochastic Bin Packing Problem in Cloud: A Chance-constrained Optimization Approach
This paper investigates a critical resource allocation problem in the
first-party cloud: scheduling containers to machines. There are tens of services, and
each service runs a set of homogeneous containers with dynamic resource usage;
containers of a service are scheduled daily in a batch fashion. This problem
can be naturally formulated as Stochastic Bin Packing Problem (SBPP). However,
traditional SBPP research often focuses on cases of empty machines, whose
objective, i.e., to minimize the number of used machines, is not well-defined
for the more common reality with nonempty machines. This paper aims to close
this gap. First, we define a new objective metric, Used Capacity at Confidence
(UCaC), which measures the maximum used resources at a probability and is
proved to be consistent for both empty and nonempty machines, and reformulate
the SBPP under chance constraints. Second, by modeling the container resource
usage distribution in a generative approach, we reveal that UCaC can be
approximated with Gaussian, which is verified by trace data of real-world
applications. Third, we propose an exact solver by solving the equivalent
cutting stock variant as well as two heuristics-based solvers -- UCaC best fit,
bi-level heuristics. We experimentally evaluate these solvers on both synthetic
datasets and real application traces, demonstrating our methodology's advantage
over traditional SBPP optimal solver minimizing the number of used machines,
with a low rate of resource violations.
Comment: To appear in SIGKDD 2022 as a Research Track paper.
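Under the Gaussian approximation the abstract mentions, UCaC reduces to mu + z_alpha * sigma, and the "UCaC best fit" heuristic can be sketched as placing each container on the machine whose UCaC grows least while staying under capacity. The confidence level, independence assumption, and function names below are illustrative, not the paper's exact formulation.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.99)   # assumed confidence level alpha = 0.99

def ucac(mu, var):
    """Used Capacity at Confidence under the Gaussian approximation:
    UCaC = mu + z_alpha * sqrt(var)."""
    return mu + Z * var ** 0.5

def ucac_best_fit(machines, container, capacity):
    """Place a container (mean, variance) on the machine whose UCaC
    increases least while staying under capacity; open a new machine
    otherwise. Independent container usage is assumed (variances add)."""
    c_mu, c_var = container
    best, best_inc = None, float("inf")
    for i, (mu, var) in enumerate(machines):
        new = ucac(mu + c_mu, var + c_var)
        if new <= capacity and new - ucac(mu, var) < best_inc:
            best, best_inc = i, new - ucac(mu, var)
    if best is None:
        machines.append((c_mu, c_var))
        return len(machines) - 1
    mu, var = machines[best]
    machines[best] = (mu + c_mu, var + c_var)
    return best

machines = [(50.0, 25.0), (20.0, 4.0)]   # nonempty machines are handled too
idx = ucac_best_fit(machines, (10.0, 9.0), capacity=100.0)
```

Note that the metric is well-defined for nonempty machines (the loop simply starts from their current mean and variance), which is the gap in prior SBPP objectives that the paper highlights.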
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
Recent advancements in Large Language Models (LLMs) have revolutionized
decision-making by breaking down complex problems into more manageable language
sequences referred to as ``thoughts''. An effective thought design should
consider three key perspectives: performance, efficiency, and flexibility.
However, existing thought paradigms can exhibit at most two of these attributes. To
address these limitations, we introduce a novel thought prompting approach
called ``Everything of Thoughts'' (XoT) to defy the law of the ``Penrose
triangle'' of existing thought paradigms. XoT leverages pretrained reinforcement learning
and Monte Carlo Tree Search (MCTS) to incorporate external domain knowledge
into thoughts, thereby enhancing LLMs' capabilities and enabling them to
generalize to unseen problems efficiently. Through the utilization of the
MCTS-LLM collaborative thought revision framework, this approach autonomously
produces high-quality comprehensive cognitive mappings with minimal LLM
interactions. Additionally, XoT empowers LLMs to engage in unconstrained
thinking, allowing for flexible cognitive mappings for problems with multiple
solutions. We evaluate XoT on several challenging multi-solution
problem-solving tasks, including Game of 24, 8-Puzzle, and Pocket Cube. Our
results demonstrate that XoT significantly outperforms existing approaches.
Notably, XoT can yield multiple solutions with just one LLM call, showcasing
its remarkable proficiency in addressing complex problems across diverse
domains.
Comment: 17 pages, 5 figures.
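One step of the MCTS machinery the abstract relies on is selecting the next "thought" to expand; a standard prior-weighted UCT rule is sketched below. The prior field stands in for XoT's pretrained policy network, and all names and values are illustrative, not the paper's implementation.

```python
import math

def uct_select(children, c=1.4):
    """Pick the child thought with the highest UCT score. The 'prior'
    term is a stand-in for a pretrained policy network's output."""
    total = sum(ch["visits"] for ch in children) or 1
    def score(ch):
        exploit = ch["value"] / ch["visits"] if ch["visits"] else 0.0
        explore = c * ch["prior"] * math.sqrt(total) / (1 + ch["visits"])
        return exploit + explore
    return max(children, key=score)

# Toy candidate thoughts for one MCTS step in, e.g., Game of 24:
children = [
    {"thought": "8*3=24", "visits": 5, "value": 4.0, "prior": 0.5},
    {"thought": "6+6=12", "visits": 1, "value": 0.2, "prior": 0.3},
    {"thought": "4-4=0",  "visits": 0, "value": 0.0, "prior": 0.2},
]
best = uct_select(children)
```

The exploit term favors thoughts that have paid off so far, while the prior-weighted explore term injects the external domain knowledge, which is how the search can produce complete thought trajectories with minimal LLM interaction.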
ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection
Anomaly detection in multivariate time series data is of paramount importance
for ensuring the efficient operation of large-scale systems across diverse
domains. However, accurately detecting anomalies in such data poses significant
challenges. Existing approaches, including forecasting and reconstruction-based
methods, struggle to address these challenges effectively. To overcome these
limitations, we propose a novel anomaly detection framework named ImDiffusion,
which combines time series imputation and diffusion models to achieve accurate
and robust anomaly detection. The imputation-based approach employed by
ImDiffusion leverages the information from neighboring values in the time
series, enabling precise modeling of temporal and inter-correlated
dependencies, reducing uncertainty in the data, thereby enhancing the
robustness of the anomaly detection process. ImDiffusion further leverages
diffusion models as time series imputers to accurately capture complex
dependencies. We leverage the step-by-step denoised outputs generated during
the inference process to serve as valuable signals for anomaly prediction,
resulting in improved accuracy and robustness of the detection process.
We evaluate the performance of ImDiffusion via extensive experiments on
benchmark datasets. The results demonstrate that our proposed framework
significantly outperforms state-of-the-art approaches in terms of detection
accuracy and timeliness. ImDiffusion has further been integrated into a real
production system at Microsoft, where we observe a remarkable 11.4% increase in
detection F1 score compared to the legacy approach. To the best of our
knowledge, ImDiffusion represents a pioneering approach that combines
imputation-based techniques with time series anomaly detection, while
introducing the novel use of diffusion models to the field.
Comment: To appear in VLDB 2024. Code: https://github.com/17000cyh/IMDiffusion.gi
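The imputation-based scoring idea — mask a point, reconstruct it from its neighbors, and treat a large reconstruction error as evidence of an anomaly — can be sketched with a trivial neighbor-mean imputer standing in for the diffusion model. Everything below is an illustration of the scoring logic only, not ImDiffusion's model.

```python
import statistics

def impute(series, idx, window=2):
    """Stand-in imputer: mean of neighboring observed values. ImDiffusion
    uses a diffusion model here; this only illustrates the scoring logic."""
    neighbors = [series[j]
                 for j in range(max(0, idx - window),
                                min(len(series), idx + window + 1))
                 if j != idx]
    return statistics.mean(neighbors)

def anomaly_scores(series):
    """Mask each point, impute it from its neighbors, and score by the
    absolute imputation error; large errors suggest anomalies."""
    return [abs(series[i] - impute(series, i)) for i in range(len(series))]

series = [1.0, 1.1, 0.9, 9.0, 1.0, 1.2]   # point anomaly at index 3
scores = anomaly_scores(series)
flagged = max(range(len(scores)), key=scores.__getitem__)
```

Because the imputer conditions on surrounding values, the score directly exploits temporal dependencies, which is the robustness argument the abstract makes for imputation over forecasting or reconstruction.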
Learning Cooperative Oversubscription for Cloud by Chance-Constrained Multi-Agent Reinforcement Learning
Oversubscription is a common practice for improving cloud resource
utilization. It allows the cloud service provider to sell more resources than
the physical limit, assuming not all users would fully utilize the resources
simultaneously. However, how to design an oversubscription policy that improves
utilization while satisfying some safety constraints remains an open
problem. Existing methods and industrial practices are over-conservative,
ignoring the coordination of diverse resource usage patterns and probabilistic
constraints. To address these two limitations, this paper formulates cloud
oversubscription as a chance-constrained optimization problem and
proposes an effective Chance-Constrained Multi-Agent Reinforcement Learning
(C2MARL) method to solve this problem. Specifically, C2MARL reduces the number
of constraints by considering their upper bounds and leverages a multi-agent
reinforcement learning paradigm to learn a safe and optimal coordination
policy. We evaluate our C2MARL on an internal cloud platform and public cloud
datasets. Experiments show that our C2MARL outperforms existing methods in
improving utilization under different levels of safety
constraints.
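The chance constraint at the heart of the formulation — keep the probability of total usage exceeding physical capacity below a small epsilon — can be illustrated with an independent-Gaussian usage model. This is a simplification for intuition: C2MARL works with learned multi-agent policies and constraint upper bounds, not this closed form.

```python
from statistics import NormalDist

def oversubscription_safe(mus, sigmas, capacity, epsilon=0.01):
    """Check the chance constraint P(sum of usages > capacity) <= epsilon
    under an assumed independent-Gaussian usage model (illustrative only)."""
    mu = sum(mus)
    sigma = sum(s ** 2 for s in sigmas) ** 0.5
    violation_prob = 1 - NormalDist(mu, sigma).cdf(capacity)
    return violation_prob <= epsilon

# Mean usages sum to 75 against a physical capacity of 100, so selling
# more than the means (oversubscribing) is still probabilistically safe.
safe = oversubscription_safe(mus=[30, 25, 20], sigmas=[5, 4, 3], capacity=100)
```

The same check with a single high-variance tenant near the limit fails, which is why a policy must coordinate diverse usage patterns rather than bound each tenant conservatively in isolation.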
Diffusion-based Time Series Data Imputation for Microsoft 365
Reliability is extremely important for large-scale cloud systems like
Microsoft 365. Cloud failures such as disk failure, node failure, etc. threaten
service reliability, resulting in online service interruptions and economic
loss. Existing works focus on predicting cloud failures and proactively taking
action before failures happen. However, they suffer from poor data quality like
data missing in model training and prediction, which limits the performance. In
this paper, we focus on enhancing data quality through data imputation by the
proposed Diffusion+, a sample-efficient diffusion model, to impute the missing
data efficiently based on the observed data. Our experiments and application
practice show that our model contributes to improving the performance of the
downstream failure prediction task.
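The role imputation plays in this pipeline — fill gaps in telemetry before the failure predictor ever sees the data — can be illustrated with a trivial interpolation stand-in. Diffusion+ would replace the imputer below with a sample-efficient diffusion model; the function and data are illustrative.

```python
def interpolate_missing(series):
    """Stand-in imputer (linear interpolation of interior gaps); Diffusion+
    would replace this with a sample-efficient diffusion model."""
    out = list(series)
    for i, v in enumerate(out):
        if v is None:
            left = next(out[j] for j in range(i - 1, -1, -1)
                        if out[j] is not None)
            right = next(out[j] for j in range(i + 1, len(out))
                         if out[j] is not None)
            out[i] = (left + right) / 2
    return out

# Disk telemetry with gaps; impute before feeding the failure predictor.
raw = [2, None, 4, 5, None, 7]
clean = interpolate_missing(raw)
```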
TraceDiag: Adaptive, Interpretable, and Efficient Root Cause Analysis on Large-Scale Microservice Systems
Root Cause Analysis (RCA) is becoming increasingly crucial for ensuring the
reliability of microservice systems. However, performing RCA on modern
microservice systems can be challenging due to their large scale, as they
usually comprise hundreds of components, leading to significant human effort. This
paper proposes TraceDiag, an end-to-end RCA framework that addresses the
challenges for large-scale microservice systems. It leverages reinforcement
learning to learn a pruning policy for the service dependency graph that
automatically eliminates redundant components, thereby significantly improving
the RCA efficiency. The learned pruning policy is interpretable and fully
adaptive to new RCA instances. With the pruned graph, a causal-based method can
be executed with high accuracy and efficiency. The proposed TraceDiag framework
is evaluated on real data traces collected from the Microsoft Exchange system,
and demonstrates superior performance compared to state-of-the-art RCA
approaches. Notably, TraceDiag has been integrated as a critical component in
the Microsoft M365 Exchange, resulting in a significant improvement in the
system's reliability and a considerable reduction in the human effort required
for RCA.
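The pruning step described above — keep only the dependency edges the learned policy considers relevant, then run causal RCA on the reduced graph — can be sketched as a threshold filter. The scores, edge names, and threshold are illustrative stand-ins for TraceDiag's RL-learned policy.

```python
def prune_dependency_graph(edges, keep_score, threshold=0.5):
    """Drop service-dependency edges whose learned relevance score falls
    below a threshold; the scores stand in for TraceDiag's RL pruning
    policy (names and threshold are assumptions)."""
    return [e for e in edges if keep_score.get(e, 0.0) >= threshold]

# Toy dependency graph for an incident observed at a frontend service:
edges = [("frontend", "auth"), ("frontend", "cache"), ("cache", "disk")]
scores = {("frontend", "auth"): 0.9,
          ("frontend", "cache"): 0.2,
          ("cache", "disk"): 0.1}
pruned = prune_dependency_graph(edges, scores)
```

With hundreds of components reduced to a handful of plausible edges, the subsequent causal analysis has far fewer candidates to test, which is where the efficiency gain comes from.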
Assess and Summarize: Improve Outage Understanding with Large Language Models
Cloud systems have become increasingly popular in recent years due to their
flexibility and scalability. Each time cloud computing applications and
services hosted on the cloud are affected by a cloud outage, users can
experience slow response times, connection issues or total service disruption,
resulting in a significant negative business impact. Outages usually
comprise several concurrent events/source causes, and therefore
understanding the context of outages is a very challenging yet crucial first
step toward mitigating and resolving outages. In current practice, on-call
engineers with in-depth domain knowledge have to manually assess and summarize
outages when they happen, which is time-consuming and labor-intensive. In this
paper, we first present a large-scale empirical study investigating the way
on-call engineers currently deal with cloud outages at Microsoft, and then
present and empirically validate a novel approach (dubbed Oasis) to help the
engineers in this task. Oasis is able to automatically assess the impact scope
of outages as well as to produce human-readable summarization. Specifically,
Oasis first assesses the impact scope of an outage by aggregating relevant
incidents via multiple techniques. Then, it generates a human-readable summary
by leveraging fine-tuned large language models like GPT-3.x. The impact
assessment component of Oasis was introduced in Microsoft over three years ago,
and it is now widely adopted, while the outage summarization component has been
recently introduced, and in this article we present the results of an empirical
evaluation we carried out on 18 real-world cloud systems as well as a
human-based evaluation with outage owners. The results show that Oasis can
effectively and efficiently summarize outages, and led Microsoft to deploy its
first prototype, which is currently under experimental adoption by some of the
incident teams.
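The second stage described above — aggregate the related incidents, then hand a fine-tuned LLM a summarization prompt — can be sketched as prompt construction. The fields and prompt wording are assumptions for illustration, not Oasis's actual template, and the LLM call itself is omitted.

```python
def build_outage_summary_prompt(incidents):
    """Aggregate related incidents into a single summarization prompt for
    a fine-tuned LLM (wording and fields are illustrative assumptions)."""
    lines = [f"- [{i['service']}] {i['title']}" for i in incidents]
    return ("Summarize the following related cloud incidents as one outage "
            "report:\n" + "\n".join(lines))

# Incidents grouped by the impact-assessment component:
incidents = [
    {"service": "Mail", "title": "Elevated send latency"},
    {"service": "Mail", "title": "SMTP connection timeouts"},
]
prompt = build_outage_summary_prompt(incidents)
```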
Automatic Root Cause Analysis via Large Language Models for Cloud Incidents
Ensuring the reliability and availability of cloud services necessitates
efficient root cause analysis (RCA) for cloud incidents. Traditional RCA
methods, which rely on manual investigations of data sources such as logs and
traces, are often laborious, error-prone, and challenging for on-call
engineers. In this paper, we introduce RCACopilot, an innovative on-call system
empowered by a large language model for automating RCA of cloud incidents.
RCACopilot matches incoming incidents to corresponding incident handlers based
on their alert types, aggregates the critical runtime diagnostic information,
predicts the incident's root cause category, and provides an explanatory
narrative. We evaluate RCACopilot using a real-world dataset consisting of a
year's worth of incidents from Microsoft. Our evaluation demonstrates that
RCACopilot achieves an RCA accuracy of up to 0.766. Furthermore, the diagnostic
information collection component of RCACopilot has been in successful use at
Microsoft for over four years.
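The workflow the abstract outlines — match an incoming incident to a handler by alert type, collect diagnostics, then predict a root cause category — can be sketched as a dispatch table. The handler names, fields, and keyword-based predictor below are illustrative stand-ins; RCACopilot's predictor is LLM-backed.

```python
def predict_root_cause(diagnostics):
    """Trivial keyword stand-in for RCACopilot's LLM-based predictor."""
    return "disk-failure" if "disk" in diagnostics else "unknown"

def route_incident(incident, handlers):
    """Match an incoming incident to a handler by alert type, let the
    handler collect diagnostics, then predict the root cause category.
    All names here are illustrative assumptions."""
    handler = handlers.get(incident["alert_type"], handlers["default"])
    diagnostics = handler(incident)
    category = predict_root_cause(diagnostics)
    return {"diagnostics": diagnostics, "category": category}

handlers = {
    "storage": lambda inc: f"disk SMART logs for {inc['id']}",
    "default": lambda inc: f"generic telemetry for {inc['id']}",
}
result = route_incident({"alert_type": "storage", "id": "INC-1"}, handlers)
```

Separating the per-alert-type diagnostic collection from the prediction step is what lets the collection component run in production independently, as the abstract notes it has for years.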